(54) Title: METHODS FOR DRUG SCREENING



(57) Abstract 

Methods and compositions for estimating the physiological specificity of a candidate drug involve: (a) detecting reporter gene product
signals from each of a plurality of different, separately isolated cells of a target organism, wherein each cell contains a recombinant construct
comprising a reporter gene operatively linked to a different endogenous transcriptional regulatory element of the target organism such that
the transcriptional regulatory element regulates the expression of the reporter gene, and the sum of the cells comprises an ensemble of the
transcriptional regulatory elements of the organism sufficient to model the transcriptional responsiveness of said organism to a drug; (b)
contacting each cell with a candidate drug; (c) detecting reporter gene product signals from each cell; (d) comparing reporter gene product
signals from each cell before and after contacting the cell with the candidate drug to obtain a drug response profile which provides an
estimate of the physiological specificity of the candidate drug.

Methods for Drug Screening
BACKGROUND

The field of the invention is pharmaceutical drug screening. Pharmaceutical research and
development is a multibillion dollar industry. Much of this spending is consumed in efforts
to focus the specificity of lead compounds. In addition, many programs are aborted after decades
of costly yet fruitless efforts to limit side effects or toxicity of candidate drugs. Accordingly, tools
that can abbreviate the research and discovery phase of drug development are desirable. Several
in vitro or cell culture-based methods have been described for identifying compounds with a
particular biological effect through the activation of a linked reporter. Gadski et al. (1992) EP
92304902.7 describes methods for identifying substances which regulate the synthesis of an
apolipoprotein; Evans et al. (1991) US Patent No. 4,981,784 describes methods for identifying
ligands for a receptor; and Farr et al. (1994) WO 94/17208 describes methods and kits utilizing
stress promoters to determine toxicity of a compound.

In general, the principle that has been applied in the existing pharmaceutical industry for
the discovery and development of new lead compounds for drugs has been the establishment of
sensitive and reliable in vitro assays for purified enzymes, and then screening large numbers of
compounds and culture supernatants for any ability to inhibit enzyme activity. The present
invention exploits the recent advances in genome science to provide for the rapid screening of
large numbers of compounds against a systemic target comprising substantially all targets in a
pathway, organism, etc. for rare compounds having the ability to inhibit the protein of interest.
The invention described herein, in effect, turns the drug discovery process inside out. This
invention provides information on the mechanism of action of every compound that affects cells,
regardless of the target. In addition, the relative specificity of all lead compounds is immediately
established.

SUMMARY OF THE INVENTION 
The invention provides methods and compositions for estimating the physiological
specificity of a candidate drug. In general, the subject methods involve (a) detecting reporter gene
product signals from each of a plurality of different, separately isolated cells of a target organism,
wherein each of said cells contains a recombinant construct comprising a reporter gene operatively
linked to a different endogenous transcriptional regulatory element (e.g. promoter) of said target
organism such that said transcriptional regulatory element regulates the expression of said
reporter gene, wherein said plurality of cells comprises an ensemble of the transcriptional
regulatory elements of said organism sufficient to model the transcriptional responsiveness of said
organism to a drug; (b) contacting each said cell with a candidate drug; (c) detecting reporter
gene product signals from each of said cells; (d) comparing said reporter gene product signals
from each of said cells before and after contacting each of said cells with said candidate drug to
obtain a drug response profile; wherein said drug response profile provides an estimate of the
physiological specificity or biological interactions of said candidate drug.

DETAILED DESCRIPTION OF THE INVENTION 
The Genome Reporter Matrix.

The invention provides methods and compositions for estimating the physiological
specificity of a candidate drug by modeling the transcriptional responses of the target organism
with an ensemble of reporters, the expressions of which are regulated by transcription regulatory
genetic elements derived from the genome of the target organism. The ensemble of reporting cells
comprises as comprehensive a collection of transcription regulatory genetic elements as is
conveniently available for the targeted organism so as to most accurately model the systemic
transcriptional response. Suitable ensembles generally comprise thousands of individually
reporting elements; preferred ensembles are substantially comprehensive, i.e. provide a
transcriptional response diversity comparable to that of the target organism. Generally, a
substantially comprehensive ensemble requires transcription regulatory genetic elements from at
least a majority of the organism's genes, and preferably includes those of all or nearly all of the
genes. We term such a substantially comprehensive ensemble a genome reporter matrix.

It is frequently convenient to use an ensemble or genome reporter matrix derived from a
lower eukaryote or common animal model to obtain preliminary information on drug specificity
in higher eukaryotes, such as humans. Because yeast, such as Saccharomyces cerevisiae, is a
bona fide eukaryote, there is substantial conservation of biochemical function between yeast and
human cells in most pathways, from the sterol biosynthetic pathway to the Ras oncogene. Indeed,
the scarcity of effective antifungal compounds illustrates how difficult it has been to find
therapeutic targets that would selectively kill fungal but not human cells. One example of a shared
response pathway is sterol biosynthesis. In human cells, the drug Mevacor (lovastatin) inhibits
HMG-CoA reductase, the key regulatory enzyme of the sterol biosynthetic pathway. As a result,
the level of a particular regulatory sterol decreases, and the cells respond by increased
transcription of the gene encoding the LDL receptor. In yeast, Mevacor also inhibits HMG-CoA
reductase and lowers the level of a key regulatory sterol. Yeast cells respond in an analogous
fashion to human cells. However, yeast do not have a gene for the LDL receptor. Instead, the
same effect is measured by increased transcription of the ERG10 gene, which encodes acetoacetyl
CoA thiolase, an enzyme also involved in sterol synthesis. Thus the regulatory response is
conserved between yeast and humans, even though the identity of the responding gene is different.

Advantages of the Genome Reporter Matrix as a Vehicle for Pharmaceutical Development. 

The advantages of the subject methods over prior art screening methods may be illustrated
by examples. Consider the difference between an in vitro assay for HMG-CoA reductase
inhibitors as presently practiced by the pharmaceutical industry, and an assay for inhibitors of
sterol biosynthesis as revealed by the ERG10 reporter. In the case of the former, information is
obtained only for those rare compounds that happen to inhibit this one enzyme. In contrast, in
the case of the ERG10 reporter, any compound that inhibits nearly any of the approximately 35
steps in the sterol biosynthetic pathway will, by lowering the level of intracellular sterols, induce
the synthesis of the reporter. Thus, the reporter can detect a much broader range of targets than
can the purified enzyme, in this case roughly 35 times as many as the in vitro assay.

Drugs often have side effects that are in part due to the lack of target specificity.
However, the in vitro assay of HMG-CoA reductase provides no information on the specificity
of a compound. In contrast, a genome reporter matrix reveals the spectrum of other genes in the
genome also affected by the compound. In considering two different compounds both of which
induce the ERG10 reporter, if one compound affects the expression of 5 other reporters and a
second compound affects the expression of 50 other reporters, the first compound is, a priori,
more likely to have fewer side effects. Because the identity of the reporters is known or
determinable, information on other affected reporters is informative as to the nature of the side
effect. A panel of reporters can be used to test derivatives of the lead compound to determine
which of the derivatives have greater specificity than the first compound.
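
The comparison of two ERG10-inducing compounds described above can be sketched in Python. This is only an illustration under stated assumptions: the two-fold change threshold, the reporter names other than ERG10, and the function names are hypothetical and do not come from the specification.

```python
# Hypothetical sketch: count reporters other than the intended one (ERG10)
# whose signal changes by at least `fold` up or down after treatment.
# The 2-fold cutoff is an assumed, illustrative threshold.

def affected_reporters(basal, treated, fold=2.0):
    """Return the set of reporters induced or repressed by at least `fold`."""
    hits = set()
    for reporter, base in basal.items():
        ratio = treated[reporter] / base  # basal signals are assumed > 0
        if ratio >= fold or ratio <= 1.0 / fold:
            hits.add(reporter)
    return hits


def off_target_count(hits, intended="ERG10"):
    """Number of affected reporters besides the intended one."""
    return len(hits - {intended})
```

Under this sketch, the compound with the smaller off-target count would be ranked as the more specific lead, and the names of the off-target reporters would indicate the likely nature of any side effect.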

As another example, consider the case of a compound that neither affects the in vitro
assay for HMG-CoA reductase nor induces the expression of the ERG10 reporter. In the
traditional approach to drug discovery, a compound that does not inhibit the target being tested
provides no useful information. However, a compound having any significant effect on a
biological process generally has some consequence on gene expression. A genome reporter
matrix can thus provide two different kinds of information for most compounds. In some cases,
the identity of the reporter genes affected by the inhibitor evidences how the inhibitor functions.
For example, a compound that induces a cAMP-dependent promoter in yeast may affect the
activity of the Ras pathway. Even where the compound affects the expression of a set of genes
that do not evidence the action of the compound, the matrix provides a comprehensive assessment
of the action of the compound that can be stored in a database for later analyses. A library of such
matrix response profiles can be continuously investigated, much as the Spectral Compendiums of
chemistry are continually referenced in the chemical arts. For example, if the database reveals that
compound X alters the expression of gene Y, and a paper is published reporting that the
expression of gene Y is sensitive to, for example, the inositol phosphate signaling pathway,
compound X is a candidate for modulating the inositol phosphate signaling pathway. In effect the
genome reporter matrix is an informational translator that takes information on a gene directly to
a compound that may already have been found to affect the expression of that gene. This tool
should dramatically shorten the research and discovery phase of drug development, and effectively
leverage the value of the publicly available research portfolio on all genes.

In many cases, a drug of interest would work on protein targets whose impact on gene
expression would not be known a priori. The genome reporter matrix can nevertheless be used
to estimate which genes would be induced or repressed by the drug. In one embodiment, a
dominant mutant form of the gene encoding a drug-targeted protein is introduced into all the
strains of the genome reporter matrix and the effect of the dominant mutant, which interferes with
the gene product's normal function, evaluated for each reporter. This genetic assay informs us
which genes would be affected by a drug that has a similar mechanism of action. In many cases,
the drug itself could be used to obtain the same information. However, even if the drug itself
were not available, genetics can be used to predetermine what its response profile would be in the
genome reporter matrix. Furthermore, it is not necessary to know the identity of any of the
responding genes. Instead, the genetic control with the dominant mutant sorts the genome into
those genes that respond and those that do not. Hence, if drugs that disrupt a given cellular
function were desired, dominant mutants for such function introduced into the genome reporter
matrix reveal what response profile to expect for such an agent.

WO 97/06277                                                                    PCT/US96/12956

For example, taxol, a recent advance in potential breast cancer therapies, has been shown
to interfere with tubulin-based cytoskeletal elements. Hence, a dominant mutant form of tubulin
provides a response profile informative for breast cancer therapies with similar modes of action
to taxol. Specifically, a dominant mutant form of tubulin is introduced into all the strains of the
genome reporter matrix and the effect of this dominant mutant, which interferes with the
microtubule cytoskeleton, evaluated for each reporter. Thus, any new compound that induces the
same response profile as the dominant tubulin mutant would provide a candidate for a taxol-like
pharmaceutical.
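
A minimal Python sketch of matching a compound's response profile against that of a dominant mutant follows. It assumes, for illustration only, that each reporter is scored as induced (+1), repressed (-1), or unchanged (0) using a two-fold cutoff; the cutoff, the function names, and the reporter labels are hypothetical.

```python
# Hypothetical sketch: reduce each profile to a per-reporter sign and test
# whether a candidate compound reproduces the dominant-mutant pattern.
# The 2-fold cutoff is an assumption made for this example.

def sign_profile(basal, stimulated, fold=2.0):
    """Score each reporter: +1 induced, -1 repressed, 0 unchanged."""
    profile = {}
    for reporter, base in basal.items():
        ratio = stimulated[reporter] / base  # basal signals are assumed > 0
        if ratio >= fold:
            profile[reporter] = 1
        elif ratio <= 1.0 / fold:
            profile[reporter] = -1
        else:
            profile[reporter] = 0
    return profile


def same_profile(candidate, reference):
    """True if the candidate has the same sign as the reference at every reporter."""
    return all(candidate[r] == s for r, s in reference.items())
```

In practice an exact match at every reporter is a strict criterion; a scored or weighted comparison may be more forgiving of measurement noise.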

In addition, the genome reporter matrix can be used to genetically create or model various
disease states. In this way, pathways present specifically in the disease state can be targeted. For
example, the specific response profile of transforming mutant Ras2(val19) identifies Ras2(val19) induced
reporters. Here, the matrix, in which each unit contains the Ras2(val19) mutation, is used to screen
for compounds that restore the response profile to that of the matrix lacking the mutation.

Though these examples are directed to the development of human therapeutics,
informative response profiles can often be obtained in nonhuman reporter matrices. Hence, for
disease causing genes with yeast homologs, even if the function of the gene is not known, a
dominant form of the gene can be introduced into a yeast-based reporter matrix to identify disease
state specific pathways for targeting. For example, a reporter matrix comprising the yeast mutant
Ras2(val19) provides a discovery vehicle for pathways specific to the human analog, the oncogene
Ras2(val12).



Application of Novel Combinatorial Chemistries with the Genome Reporter Matrix.

Among the most important advances in drug development have been advances in
combinatorial synthesis of chemical libraries. In conventional drug screening with purified enzyme
targets, combinatorial chemistries can often help create new derivatives of a lead compound that
will also inhibit the target enzyme but with some different and desirable property. However,
conventional methods would fail to recognize a molecule having a substantially divergent
specificity. The genome reporter matrix offers a simple solution to recognizing new specificities
in combinatorial libraries. Specifically, pools of new compounds are tested as mixtures across the
matrix. If the pool has any new activity not present in the original lead compound, new genes are
affected among the reporters. The identity of those genes provides a guide to the target of the new
compound. Furthermore, the matrix offers an added bonus that compensates for a common
weakness in most chemical syntheses. Specifically, most syntheses produce the desired product
in greatest abundance and a collection of other related products as contaminants due to side
reactions in the synthesis. Traditionally the solution to contaminants is to purify away from them.
However, the genome reporter matrix exploits the presence of these contaminants. Syntheses can
be adjusted to make them less specific with a greater number of side reactions and more
contaminants to determine whether anything in the total synthesis affects the expression of target
genes of interest. If there is a component of the mixture with the desired activity on a particular
reporter, that reporter can be used to assay purification of the desired component from the
mixture. In effect, the reporter matrix allows a focused survey of the effect on single genes to
compensate for the impurity of the mixture being tested.

Isoprenoids are an especially attractive class for the genome reporter matrix. In nature,
isoprenoids are the champion signaling molecules. Isoprenoids are derivatives of the five carbon
compound isoprene, which is made as an intermediate in cholesterol biosynthesis. Isoprenoids
include many of the most famous fragrances, pigments, and other biologically active compounds,
such as the antifungal sesquiterpenoids, which plants use defensively against fungal infection.
There are roughly 10,000 characterized isoprene derivatives and many more potential ones.
Because these compounds are used in nature to signal biological processes, they are likely to
include some of the best membrane permeant molecules.

Isoprenoids possess another characteristic that lends itself well to drug discovery through
the genome reporter matrix. Pure isoprenoid compounds can be chemically treated to create a
wide mixture of different compounds quickly and easily, due to the particular arrangement of
double bonds in the hydrocarbon chains. In effect, isoprenoids can be mutagenized from one form
into many different forms much as a wild-type gene can be mutagenized into many different
mutants. For example, vitamin D used to fortify milk is produced by ultraviolet irradiation of the
isoprene derivative known as ergosterol. New biologically active isoprenoids are generated and
analyzed with a genome reporter matrix as follows. First, a pure isoprenoid such as limonene is
tested to determine its response profile across the matrix. Next, the isoprenoid (e.g. limonene)
is chemically altered to create a mixture of different compounds. This mixture is then tested
across the matrix. If any new responses are observed, then the mixture has new biologically active
species. In addition, the identity of the reporter genes provides information regarding what the
new active species does, an activity to be used to monitor its purification, etc. This strategy is
also applied to other mutable chemical families in addition to isoprenoids.
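
The pure-compound versus mixture comparison above amounts to a set difference over responding reporters. The following Python sketch illustrates it; the function name and reporter labels are hypothetical, and deciding which reporters count as "responding" is assumed to have been done beforehand.

```python
# Hypothetical sketch: reporters that respond to the chemically altered
# mixture but not to the pure starting isoprenoid indicate a new active species.

def new_responses(pure_hits, mixture_hits):
    """Return, in sorted order, reporters affected only by the mixture."""
    return sorted(set(mixture_hits) - set(pure_hits))


# Illustrative labels only; these are not reporters named in the text.
limonene_hits = {"rep_A", "rep_B"}
treated_mixture_hits = {"rep_A", "rep_B", "rep_C"}
novel = new_responses(limonene_hits, treated_mixture_hits)
```

Any reporter returned here could then serve as the assay used to follow purification of the new species from the mixture.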
Applications of the Genome Reporter Matrix in Antibiotic and Antifungal Discovery.

Fungi are important pathogens of plants and animals and make a major impact on the
production of many food crops and on animal, including human, health. One major difficulty in
the development of antifungal compounds has been the problem of finding pharmaceutical targets
in fungi that are specific to the fungus. The genome reporter matrix offers a new tool to solve this
problem. Specifically, all molecules that fail to elicit any response in the Saccharomyces reporter
are collected into a set, which by definition must be either inactive biologically or have a very
high specificity. A reporter library is created from the targeted pathogen such as Cryptococcus,
Candida, Aspergillus, Pneumocystis etc. All molecules from the set that do not affect
Saccharomyces are tested on the pathogen, and any molecule that elicits an altered response
profile in the pathogen in principle identifies a target that is pathogen-specific. As an example,
a pathogen may have a novel signaling enzyme, such as an inositol kinase that alters a position on
the inositol ring that is not altered in other species. A compound that inhibits that enzyme would
affect the signaling pathway in the pathogen, and alter a response profile, but due to the absence
of that enzyme in other organisms, would have no effect there. By sequencing the reporter genes
affected specifically in the target fungus and comparing the sequence with others in GenBank, one
can identify biochemical pathways that are unique to the target species. Useful identified products
include not only agents that kill the target fungus but also the identification of specific targets in
the fungus for other pharmaceutical screening assays.
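
The two-stage screen described above can be sketched as a simple filter in Python. The data layout (a mapping from compound name to the set of responding reporters in each organism) and all names are assumptions made for illustration.

```python
# Hypothetical sketch of the two-stage antifungal screen:
# 1) keep compounds that elicit no response in the Saccharomyces matrix;
# 2) of those, keep compounds that alter the pathogen's response profile.

def pathogen_specific(compounds, yeast_hits, pathogen_hits):
    """Map each candidate compound to the pathogen reporters it affects."""
    silent_in_yeast = [c for c in compounds if not yeast_hits.get(c)]
    return {
        c: sorted(pathogen_hits[c])
        for c in silent_in_yeast
        if pathogen_hits.get(c)
    }
```

The reporters listed for each surviving compound are the ones whose sequences would be compared against a sequence database to look for pathogen-specific pathways.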

The identification of compounds that kill bacteria has been successfully pursued by the
pharmaceutical industry for decades. It is rather simple to spot a compound that kills bacteria in
a spot test on a petri plate. Unfortunately, growth inhibition screens have provided very limited
lead compound diversity. However, there is much complexity to bacterial physiology and ecology
that could offer an edge to development of combination therapies for bacteria, even for
compounds that do not actually kill the bacterial cell. Consider for example the bacteria that
invade the urethra and persist there through the elaboration of surface attachments known as
fimbriae. Antibiotics in the urine stream have limited access to the bacteria because the urine
stream is short-lived and infrequent. However, if one could block the synthesis of the fimbriae to
detach the bacteria, existing therapies would become more effective. Similarly, if the chemotaxis
mechanism of bacteria were crippled, the ability of bacteria to establish an effective infection
would, in some species, be compromised. A genome reporter matrix for a bacterial pathogen that
contains reporters for the expression of genes involved in chemotaxis or fimbriae synthesis, as
examples, identifies not only compounds that do kill the bacteria in a spot test, but also those that
interfere with key steps in the biology of the pathogen. These compounds would be exceedingly
difficult to discover by conventional means.
Applications of Human Cell Based Genome Reporter Matrices.

A genome reporter matrix based on human cells provides many important applications.
For example, an interesting application is the development of antiviral compounds. When human
cells are infected by a wide range of viruses, the cells respond in a complex way in which only a
few of the components have been identified. For example, certain interferons are induced, as is
a double-stranded RNase. Each of these responses individually provides some measure of
protection. A matrix that reports the induction of interferon genes and the double-stranded RNase
is able to detect compounds that could prophylactically protect cells before the arrival of the virus.
Other protective effects may be induced in parallel. The incorporation of a panel of other reporter
genes in the matrix is used to identify those compounds with the highest degree of specificity.

Use of the Genome Reporter Matrix.

The procedure to be followed in the subject methods will now be outlined. The initial step
involves determining the basal or background response profile by detecting reporter gene product
signals from each of a plurality of different, separately isolated cells of a target organism under
one or more of a variety of physical conditions, such as temperature and pH, medium, and
osmolarity. As discussed above, the target organism may be a yeast, animal model, human, plant,
pathogen, etc. Generally, the cells are arranged in a physical matrix such as a microtiter plate.
Each of the cells contains a recombinant construct comprising a reporter gene operatively linked
to a different endogenous transcriptional regulatory element of said target organism such that said
transcriptional regulatory element regulates the expression of said reporter gene. A sufficient
number of different recombinant cells are included to provide an ensemble of transcriptional
regulatory elements of said organism sufficient to model the transcriptional responsiveness of said
organism to a drug. In a preferred embodiment, the matrix is substantially comprehensive for the
selected regulatory elements, e.g. essentially all of the gene promoters of the targeted organism
are included. Other cis-acting or trans-acting transcription regulatory regions of the targeted
organism can also be evaluated. In one embodiment, a genome reporter matrix is constructed
from a set of lacZ fusions to a substantially comprehensive set of yeast genes. The fusions are
preferably constructed in a diploid cell of the a/α mating type to allow the introduction of
dominant mutations by mating, though haploid strains also find use with particularly sensitive
reporters for certain functions. The fusions are conveniently arrayed onto a microtiter plate
having 96 wells separating distinct fusions into wells having defined alphanumeric X-Y
coordinates, where each well (defined as a unit) confines a cell or colony of cells having a
construct of a reporter gene operatively joined to a different transcriptional promoter. Permanent
collections of these plates are readily maintained at -80°C and copies of this collection can be
made and propagated by simple mechanics and may be automated with commercial robotics.
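
As an illustration of the alphanumeric X-Y addressing, the Python sketch below maps a running fusion index to a plate number and well label on 96-well plates. The 8-row by 12-column layout with rows lettered A to H is the common 96-well convention; the row-major fill order and the function names are assumptions, not something the text specifies.

```python
# Hypothetical sketch: assign each reporter fusion a (plate, well) address.
# Assumes standard 96-well geometry (rows A-H, columns 1-12), filled row by row.
import string

ROWS = string.ascii_uppercase[:8]  # "ABCDEFGH"
COLS = 12
WELLS_PER_PLATE = len(ROWS) * COLS  # 96


def well_for(index):
    """Return (plate_number, well_label) for a zero-based fusion index."""
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS)
    return plate, f"{ROWS[row]}{col + 1}"


def plates_needed(n_fusions):
    """Number of 96-well plates required to hold n_fusions units."""
    return -(-n_fusions // WELLS_PER_PLATE)  # ceiling division
```

For example, under these assumptions a collection of 6,000 fusions (an illustrative figure) would occupy 63 plates.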

The methods involve detecting a reporter gene product signal for each cell of the matrix.
A wide variety of reporters may be used, with preferred reporters providing conveniently
detectable signals (e.g. by spectroscopy). Typically, the signal is a change in one or more
electromagnetic properties, particularly optical properties, at the unit. As examples, a reporter
gene may encode an enzyme which catalyzes a reaction at the unit which alters light absorption
properties at the unit; radiolabeled or fluorescent tag-labeled nucleotides can be incorporated into
nascent transcripts which are then identified when bound to oligonucleotide probes; etc.
Examples include β-galactosidase, invertase, green fluorescent protein, etc. Invertase fusions
have the virtue that functional fusions can be selected from complex libraries by the ability of
invertase to support growth on sucrose, which allows genes whose expression increases or
decreases to be identified by measuring the relative
growth on medium containing sucrose with or without the compound of interest. Electronic
detectors for optical, radiative, etc. signals are commercially available, e.g. automated, multi-well
colorimetric detectors, similar to automated ELISA readers. Reporter gene product signals may
also be monitored as a function of other variables such as stimulus intensity or duration, time (for
dynamic response analyses), etc.

In a preferred embodiment, the basal response profiles are determined through the
colorimetric detection of a lacZ reaction product. The optical signal generated at each well is
detected and linearly transduced to generate a corresponding digital electrical output signal. The
resultant electrical output signals are stored in computer memory as a genome reporter output
signal matrix data structure associating each output signal with the coordinates of the
corresponding microtiter plate well and the stimulus or drug. This information is indexed against
the matrix to form reference response profiles that are used to determine the response of each
reporter to any milieu in which a stimulus may be provided.
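
One possible in-memory layout for such an output signal matrix is sketched below in Python. It is an assumed design for illustration, not the data structure of the specification: the class names, field names, and the choice of a dictionary keyed by stimulus, plate, and well are all hypothetical.

```python
# Hypothetical sketch of a "genome reporter output signal matrix":
# each digitized signal is stored against its plate/well coordinates
# and the stimulus (or drug) under which it was measured.
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    plate: int
    well: str       # alphanumeric coordinate, e.g. "B7"
    stimulus: str   # e.g. "basal" or a compound identifier
    signal: float   # digitized detector output


class SignalMatrix:
    def __init__(self):
        self._data = {}

    def record(self, reading):
        """Store one reading, keyed by stimulus and well coordinates."""
        key = (reading.stimulus, reading.plate, reading.well)
        self._data[key] = reading.signal

    def profile(self, stimulus):
        """Return {(plate, well): signal} for every unit under one stimulus."""
        return {
            (plate, well): signal
            for (stim, plate, well), signal in self._data.items()
            if stim == stimulus
        }
```

Keeping the stimulus in the key lets basal and stimulated profiles for the same wells sit side by side, so they can later be compared unit by unit.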
After establishing a basal response profile for the matrix, each cell is contacted with a
candidate drug. The term drug is used loosely to refer to agents which can provoke a specific
cellular response. Preferred drugs are pharmaceutical agents, particularly therapeutic agents. The
drug induces a complex response pattern of repression, silence and induction across the matrix
(i.e. a decrease in reporter activity at some units, an increase at others, and no change at still
others). The response profile reflects the cell's transcriptional adjustments to maintain
homeostasis in the presence of the drug. While a wide variety of candidate drugs can be
evaluated, it is important to adjust the incubation conditions (e.g. concentration, time, etc.) to
preclude cellular stress, and hence ensure the measurement of pharmaceutically relevant response
profiles. Hence, the methods monitor transcriptional changes which the cell uses to maintain
cellular homeostasis. Cellular stress may be monitored by any convenient way such as membrane
potential (e.g. dye exclusion), cellular morphology, expression of stress response genes, etc. In
a preferred embodiment, the compound treatment is performed by transferring a copy of the entire
matrix to fresh medium containing the first compound of interest.

After contacting the cells with the candidate drug, the reporter gene product signals from
each of said cells are again measured to determine a stimulated response profile. The basal or
background response profile is then compared with (e.g. subtracted from, or divided into) the
stimulated response profile to identify the cellular response profile to the candidate drug. The
cellular response can be characterized in a number of ways. For example, the basal profile can
be subtracted from the stimulated profile to yield a net stimulation profile. In another
embodiment, the stimulated profile is divided by the basal profile to yield an induction ratio
profile. Such comparison profiles provide an estimate of the physiological specificity of the
candidate drug.
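
The two comparison profiles named above can be written out directly. The Python sketch below assumes each profile is a mapping from a unit (well) to its signal and that every basal signal is nonzero; the function names and well labels are illustrative.

```python
# Sketch of the two comparisons described in the text, computed per unit.
# Assumes both profiles cover the same wells and basal signals are nonzero.

def net_stimulation(basal, stimulated):
    """Stimulated signal minus basal signal for each unit."""
    return {unit: stimulated[unit] - basal[unit] for unit in basal}


def induction_ratio(basal, stimulated):
    """Stimulated signal divided by basal signal for each unit."""
    return {unit: stimulated[unit] / basal[unit] for unit in basal}
```

A ratio above 1 indicates induction, below 1 repression, and near 1 no change; the ratio form has the practical advantage of being comparable across reporters with very different basal signal levels.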

In another embodiment of the invention, a matrix of hybridization probes corresponding
to a predetermined population of genes of the selected organism is used to specifically detect
changes in gene transcription which result from exposing the selected organism or cells thereof
to a candidate drug. In this embodiment, one or more cells derived from the organism are exposed
to the candidate drug in vivo or ex vivo under conditions wherein the drug effects a change in
gene transcription in the cell to maintain homeostasis. Thereafter, the gene transcripts, primarily
mRNA, of the cell or cells are isolated by conventional means. The isolated transcripts or cDNAs
complementary thereto are then contacted with an ordered matrix of hybridization probes, each
probe being specific for a different one of the transcripts, under conditions wherein each of the
transcripts hybridizes with a corresponding one of the probes to form hybridization pairs. The
ordered matrix of probes provides, in aggregate, complements for an ensemble of genes of the
organism sufficient to model the transcriptional responsiveness of the organism to a drug. The
probes are generally immobilized and arrayed onto a solid substrate such as a microtiter plate.
Specific hybridization may be effected, for example, by washing the hybridized matrix with excess
non-specific oligonucleotides. A hybridization signal is then detected at each hybridization pair
to obtain a matrix-wide signal profile. A wide variety of hybridization signals may be used;
conveniently, the cells are pre-labeled with radionucleotides such that the gene transcripts provide
a radioactive signal that can be detected in the hybridization pairs. The matrix-wide signal profile
of the drug-stimulated cells is then compared with a matrix-wide signal profile of negative control
cells to obtain a specific drug response profile.

The invention also provides means for computer-based qualitative analysis of candidate
drugs and unknown compounds. A wide variety of reference response profiles may be generated
and used in such analyses. For example, the response of a matrix to loss of function of each
protein or gene or RNA in the cell is evaluated by introducing a dominant allele of a gene to each
reporter cell and determining the response of the reporter as a function of the mutation. For this
purpose, dominant mutations are preferred but other types of mutations can be used. Dominant
mutations are created by in vitro mutagenesis of cloned genes followed by screening in diploid
cells for dominant mutant alleles.

In an alternative embodiment, the reporter matrix is developed in a strain deficient for the
UPF gene function, wherein the majority of nonsense mutations cause a dominant phenotype,
allowing dominant mutations to be constructed for any gene. UPF1 encodes a protein that causes
the degradation of mRNAs that, due to mutation, contain premature termination codons. In
mutants lacking UPF1 function most nonsense mutations encode short truncated protein
fragments. Many of these interfere with normal protein function and hence have dominant
phenotypes. Thus in a upf1 mutant, many nonsense alleles behave as dominant mutations (see,
e.g. Leeds, P. et al. (1992) Molec. Cell Biology. 12:2165-77).

The resultant data identify genetic response profiles. These data are sorted by individual
gene response to determine the specificity of each gene to a particular stimulus. A weighting
matrix is established which weights the signals proportionally to the specificity of the
corresponding reporters. The weighting matrix is revised dynamically, incorporating data from
every screen. A gene regulation function is then used to construct tables of regulation identifying
which cells of the matrix respond to which mutation in an indexed gene, and which mutations
affect which cells of the matrix.
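
The weighting and regulation-table steps can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each genetic response profile is stored as a set of responding reporter names and that a reporter's weight is the inverse of the number of stimuli it responds to. Neither the data layout nor this weighting formula is specified in the text, and the names `specificity_weights`, `regulation_tables` and the sample data are hypothetical.

```python
from collections import defaultdict


def specificity_weights(profiles):
    """Weight each reporter by 1 / (number of stimuli it responds to).

    `profiles` maps a stimulus (e.g. a mutation) to the set of reporters
    that respond to it. The inverse-count formula is an assumption.
    """
    counts = defaultdict(int)
    for reporters in profiles.values():
        for reporter in reporters:
            counts[reporter] += 1
    return {reporter: 1.0 / n for reporter, n in counts.items()}


def regulation_tables(profiles):
    """Build both lookups: mutation -> reporters and reporter -> mutations."""
    by_mutation = {m: set(r) for m, r in profiles.items()}
    by_reporter = defaultdict(set)
    for mutation, reporters in profiles.items():
        for reporter in reporters:
            by_reporter[reporter].add(mutation)
    return by_mutation, dict(by_reporter)


# Hypothetical screen data, invented for illustration only.
genetic_profiles = {
    "mutA": {"rep1", "rep2"},
    "mutB": {"rep2", "rep3"},
    "mutC": {"rep2"},
}
weights = specificity_weights(genetic_profiles)
by_mutation, by_reporter = regulation_tables(genetic_profiles)
```

Under these assumptions, calling `specificity_weights` again after each new screen is one simple way to revise the weights as data accumulate.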

Response profiles for an unknown stimulus (e.g. new chemicals, unknown compounds or
unknown mixtures) may be analyzed by comparing the new stimulus response profiles with
response profiles to known chemical stimuli. Such comparison analyses generally take the form
of an indexed report of the matches to the reference chemical response profiles, ranked according
to the weighted value of each matching reporter. If there is a match (i.e. perfect score), the
response profile identifies a stimulus with the same target as one of the known compounds upon
which the response profile database is built. If the response profile is a subset of cells in the
matrix stimulated by a known compound, the new compound is a candidate for a molecule with
greater specificity than the reference compound. In particular, if the reporters responding
uniquely to the reference chemical have a low weighted response value, the new compound is
concluded to be of greater specificity. Alternatively, if the reporters responding uniquely to the
reference compound have a high weighted response value, the new compound is concluded to be
active downstream in the same pathway. If the output overlaps the response profile of a known
reference compound, the overlap is sorted by a quantitative evaluation with the weighting matrix
to yield common and unique reporters. The unique reporters are then sorted against the
regulation tables and best matches used to deduce the candidate target. If the response profile
does not either overlap or match a chemical response profile, then the database is inadequate to
infer function and the response profile may be added to the reference chemical response profiles.
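
The comparison rules in the preceding paragraph can be sketched in Python as follows. The sketch assumes that profiles are sets of responding reporter names and that a reference is scored by the summed weight of the reporters it shares with the new profile. The cutoff separating a "low" from a "high" weighted value (`threshold`) is an invented parameter, and all function names and data are hypothetical.

```python
def classify(new, reference):
    """Relate a new response profile to one reference profile (both sets)."""
    if new == reference:
        return "match"
    if new < reference:          # strict subset
        return "subset"
    if new & reference:          # some shared reporters
        return "overlap"
    return "none"


def ranked_report(new, references, weights):
    """Rank references by the summed weight of reporters shared with `new`."""
    report = []
    for name, reference in references.items():
        shared = new & reference
        score = sum(weights.get(r, 0.0) for r in shared)
        report.append((name, classify(new, reference), score))
    # Highest weighted score first; ties broken by name for a stable report.
    return sorted(report, key=lambda row: (-row[2], row[0]))


def interpret_subset(new, reference, weights, threshold=0.5):
    """Apply the low/high weight rule to reporters unique to the reference.

    Requires `new` to be a strict subset of `reference`.
    """
    unique = reference - new
    mean_weight = sum(weights.get(r, 0.0) for r in unique) / len(unique)
    if mean_weight < threshold:
        return "greater specificity"
    return "downstream in same pathway"


# Hypothetical reference library and weights, invented for illustration.
references = {
    "compoundX": {"rep1", "rep2", "rep3"},
    "compoundY": {"rep4"},
}
weights = {"rep1": 1.0, "rep2": 0.25, "rep3": 0.5, "rep4": 1.0}
report = ranked_report({"rep1", "rep2"}, references, weights)
```

A profile that neither matches nor overlaps any reference would appear in such a report with the label `"none"` for every entry, which corresponds to the case where the database is inadequate to infer function.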

The response profile of a new chemical stimulus may also be compared to a known
genetic response profile for target gene(s). If there is a match between the two response profiles,
the target gene or its functional pathway is the presumptive target. If the chemical
response profile is a subset of a genetic response profile, the target of the drug is downstream of
the mutant gene but in the same pathway. If the chemical response profile includes as a subset
a genetic response profile, the target of the chemical is deduced to be in the same pathway as the
target gene but upstream and/or the chemical affects additional cellular components. If not, the
chemical response profile is novel and defines an orphan pathway.
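
The four cases above map naturally onto set relations. The following Python sketch assumes each profile is a set of responding reporter names. It reads the final "if not" literally, so a partial overlap that is neither a subset nor a superset falls into the novel case. The function name `infer_target` and the reporter names are hypothetical.

```python
def infer_target(chemical, genetic):
    """Apply the four comparison rules to two sets of responding reporters."""
    if chemical == genetic:
        return "target gene or its pathway is the presumptive target"
    if chemical < genetic:       # chemical profile is a strict subset
        return "target is downstream of the mutant gene, same pathway"
    if chemical > genetic:       # chemical profile strictly contains the genetic one
        return "target is upstream in the same pathway and/or has additional effects"
    return "novel profile (orphan pathway)"


# Hypothetical genetic response profile, invented for illustration.
genetic = {"rep1", "rep2", "rep3"}
same = infer_target({"rep1", "rep2", "rep3"}, genetic)
narrower = infer_target({"rep1"}, genetic)
broader = infer_target({"rep1", "rep2", "rep3", "rep4"}, genetic)
unrelated = infer_target({"rep9"}, genetic)
```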

While described in terms of cells comprising reporters under the transcriptional control
of endogenous regulatory regions, there are a number of other means of practicing the invention.
For example, each unit of a genome reporter matrix reporting on gene expression might confine
a different oligonucleotide probe capable of hybridizing with a corresponding different reporter
transcript. Alternatively, each unit of a matrix reporting on DNA-protein interaction might
confine a cell having a first construct of a reporter gene operatively joined to a targeted
transcription factor binding site and a second hybrid construct encoding a transcription activation
domain fused to a different structural gene, i.e. a one-dimensional one-hybrid system matrix.
Alternatively, each unit of a matrix reporting on protein-protein interactions might confine a cell
having a first construct of a reporter gene operatively joined to a targeted transcription factor
binding site, a second hybrid construct encoding a transcription activation domain fused to a
different constitutionally expressed gene and a third construct encoding a DNA-binding domain
fused to yet a different constitutionally expressed gene, i.e. a two-dimensional two-hybrid system
matrix.

The following examples are offered by way of illustration and not by way of limitation. 

EXAMPLES 

1. Transcriptional promoter-reporter gene matrix

A) Construction of a physical matrix stimulated with the drug mevinolin (lovastatin, Mevacor).

Mevinolin is a compound known to inhibit cholesterol biosynthesis. Initially, the maximal
non-toxic (as measured by cell growth and viability) concentration of mevinolin on the reporter
cells was determined by serial dilution to be 25 µg/ml. To produce a mevinolin-stimulated matrix,
each well of 60 microtiter plates is filled with 100 µl culture medium containing 25 µg/ml
mevinolin in a 2% ethanol solution. An aliquot of each member of the reporter matrix is added
to each well allowing for a dilution of approximately 1:100. The cells are incubated in the
medium until the turbidity of the average reporter increases by 20 fold. Each well is then
quantified for turbidity as a measure of growth, and is treated with a lysis solution to allow
measurement of β-galactosidase from each fusion.

B) Generation of an output signal matrix data structure.

Both the turbidity and the β-galactosidase are read on commercially available microtiter
plate readers (e.g. BioRad) and the data captured as an ASCII file. From this file, the value of
the individual cells in the reporter matrix to a 2% ethanol solution in the reference response profile
is subtracted. The difference corresponds to the mevinolin response profile. This file is converted
in the computer to a table indexed by the response of each cell to the inhibitor. For example, the
genes encoding acetoacetyl-CoA thiolase and squalene synthase increase 10 fold, while SIR3 and
LEU2, two unrelated genes, remain unchanged. The response of the reporter matrix to other
compounds is similarly determined and stored as output response profiles.
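
The subtraction and indexing just described can be sketched in a few lines of Python. The sketch assumes the plate-reader file has already been parsed into one dictionary of readings per condition, keyed by reporter. The readings are invented values in arbitrary units, chosen only to echo the 10-fold example, and the function names are hypothetical.

```python
def response_profile(treated, reference):
    """Subtract the solvent-only reference reading from the treated reading."""
    return {rep: treated[rep] - reference[rep] for rep in treated}


def indexed_by_response(profile):
    """Return (reporter, difference) rows, largest absolute change first."""
    return sorted(profile.items(), key=lambda row: (-abs(row[1]), row[0]))


# Hypothetical readings (arbitrary units), invented for illustration.
ethanol_only = {
    "acetoacetyl-CoA thiolase": 10.0,
    "squalene synthase": 12.0,
    "SIR3": 8.0,
    "LEU2": 20.0,
}
mevinolin = {
    "acetoacetyl-CoA thiolase": 100.0,
    "squalene synthase": 120.0,
    "SIR3": 8.0,
    "LEU2": 20.0,
}
profile = response_profile(mevinolin, ethanol_only)
table = indexed_by_response(profile)
```

Sorting by absolute change is only one possible reading of "indexed by the response"; the text does not say how the table is ordered.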

C) Comparison of Signal Matrix data structure with a Signal Matrix database.

A physical matrix is constructed as described above except the mevinolin is replaced with
an unknown test compound. The resultant response profile is compared to the response profiles
of a library of known bioactive compounds and analyzed as described above. For example, if the
test compound output profile shows both acetoacetyl-CoA thiolase and squalene synthase genes
induced, then the output profile matches that expected of an inhibitor of cholesterol synthesis.
If the response profile has fewer other cells affected than the response profile to mevinolin, the
unknown compound is a candidate for greater specificity. If the response profile of the new
chemical affects fewer other reporters than the response profile to mevinolin, and if the other
reporters affected by mevinolin have a lower weighted value, then the compound is a candidate
for greater specificity. If the response profile has more different cells affected than the response
profile to mevinolin, then the compound is a candidate for less specificity. In the case where
mixtures of compounds are tested, the highest weighted responses are evaluated to determine
whether they can be deconvolved into the response profile of two different compounds, or of two
different genetic response profiles.

2. Reporter transcript-oligonucleotide hybridization probe matrix: Construction of stimulated
physical matrix and generation of an output signal matrix data structure.

Unlabeled oligonucleotide hybridization probes complementary to the mRNA transcript
of each yeast gene are arrayed on a silicon substrate etched by standard techniques (e.g. Fodor
et al. (1991) Science 252, 767). The probes are of length and sequence to ensure specificity for
the corresponding yeast gene, typically about 24-240 nucleotides in length.

A confluent HeLa cell culture is treated with 15 µg/ml mevinolin in 2% ethanol for 4 hours
while maintained in a humidified 5% CO2 atmosphere at 37°C. Messenger RNA is extracted,
reverse transcribed and fluorophore-labeled according to standard methods (Sambrook et al.,
Molecular Cloning, 3rd ed.). The resultant cDNA is hybridized to the array of probes, the array
is washed free of unhybridized labeled cDNA, the hybridization signal at each unit of the array
quantified using a confocal microscope scanner (instruments by Molecular Devices and
Affymetrix), and the resultant matrix response data stored in digital form.

3. Two-dimensional two-hybrid matrix

A) Construction of stimulated physical matrix.

The two-dimensional two-hybrid (see, e.g. Chien et al. (1991) PNAS, 88, 9578) matrix is
designed to screen for compounds that specifically affect the interaction of two proteins, e.g. the
interaction of a human signal transducer and activator of transcription (STAT) with an interleukin
receptor. Two hybrid fusions are generated by standard methods: each strain contains a portion
of the targeted human STAT gene, fused to a portion of a yeast or bacterial gene encoding a DNA
binding domain (e.g. GAL4:1-147). The DNA sequence recognized by that DNA binding domain
(e.g. UASG) is inserted in place of the enhancer sequence 5' to the selected reporter (e.g. lacZ).
The strain also contains another fusion consisting of an intracellular portion of the targeted
receptor gene whose protein product interacts with the STAT. This receptor gene is fused with
a gene fragment encoding a transcriptional activation domain (e.g. GAL4:768-881).

B) Generation of signal matrix data structure.

Both the turbidity and the β-galactosidase are read on commercial microtiter plate readers
(BioRad) and the data captured as an ASCII file.

C) Comparison of signal matrix data structure with database. 

Data are analyzed for those compounds that block the interaction of the two human
proteins by reducing the signal produced from the reporter in the various strains containing pairs
of human proteins. The output is processed to identify compounds with a large impact on a
reporter whose expression is dependent on a single pair of interacting human proteins. An
inverted weighting matrix is used to evaluate these data as preferred compounds do not affect
even the least specific reporters in the matrix.
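
One possible reading of the inverted weighting is sketched below in Python. The text does not define the inversion, so the sketch assumes specificity weights lie between 0 and 1 and inverts them as 1 minus the weight, so that a change in a poorly specific reporter costs the most. The strain names, weights and signal changes are invented, and `inverted_weights` and `off_target_penalty` are hypothetical helper names.

```python
def inverted_weights(weights):
    """Invert specificity weights (assumed to lie in [0, 1]) so that the
    least specific reporters carry the largest penalty."""
    return {rep: 1.0 - w for rep, w in weights.items()}


def off_target_penalty(changes, target, weights):
    """Sum inverted-weight * |signal change| over all reporters except the target."""
    inv = inverted_weights(weights)
    return sum(inv[rep] * abs(delta)
               for rep, delta in changes.items() if rep != target)


# Hypothetical strains and signal changes, invented for illustration.
weights = {"STAT-receptor": 1.0, "pairB": 0.75, "pairC": 0.25}
selective = {"STAT-receptor": -8.0, "pairB": 0.0, "pairC": 0.0}
broad = {"STAT-receptor": -8.0, "pairB": -4.0, "pairC": -4.0}

selective_penalty = off_target_penalty(selective, "STAT-receptor", weights)
broad_penalty = off_target_penalty(broad, "STAT-receptor", weights)
```

Under this assumed scheme, a preferred compound combines a large change in the targeted reporter with a penalty at or near zero.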

All publications and patent applications cited in this specification are herein incorporated
by reference as if each individual publication or patent application were specifically and
individually indicated to be incorporated by reference. Although the foregoing invention has been
described in some detail by way of illustration and example for purposes of clarity of
understanding, it will be readily apparent to those of ordinary skill in the art in light of the
teachings of this invention that certain changes and modifications may be made thereto without
departing from the spirit or scope of the appended claims.

WHAT IS CLAIMED IS:

1. A method for estimating the physiological specificity of a candidate drug, comprising
steps:

(a) detecting reporter gene product signals from each of a plurality of different, separately
isolated cells of a target organism, wherein each of said cells contains a recombinant construct
comprising a reporter gene operatively linked to a different endogenous transcriptional regulatory
element of said target organism such that said transcriptional regulatory element regulates the
expression of said reporter gene, wherein said plurality of cells comprises an ensemble of the
transcriptional regulatory elements of said organism sufficient to model the transcriptional
responsiveness of said organism to a drug;

(b) contacting each of said cells with a candidate drug under conditions wherein said cells
maintain homeostasis;

(c) detecting reporter gene product signals from each of said cells;

(d) comparing said reporter gene product signals from each of said cells before and after
contacting each of said cells with said candidate drug to obtain a drug response profile;

wherein said drug response profile provides an estimate of the physiological specificity of
said candidate drug.

2. A method according to claim 1, said ensemble comprising a majority of all different
transcriptional regulatory elements of said organism.

3. A method according to claim 1, said ensemble comprising essentially all different
transcriptional regulatory elements of said organism.

4. A method according to claim 1, said drug being a candidate human therapeutic.

5. A method according to claim 1, wherein said cells are yeast cells.

6. A method according to claim 1, wherein said cells are bacterial cells.

7. A method according to claim 1, wherein said cells are human cells.

8. A method according to claim 1, wherein the reporter gene is the lacZ gene, the suc2 gene,
or a gene encoding a green fluorescent protein.

9. A method for estimating the physiological specificity of a candidate drug, comprising
steps:

(a) contacting a cell isolated from a selected organism with a candidate drug under
conditions wherein said drug effects a change in gene transcription in said cell to maintain
homeostasis;

(b) isolating gene transcripts from said cells;

(c) contacting said gene transcripts with an ordered matrix of hybridization probes, each
probe specific for a different one of said transcripts, under conditions wherein each of said gene
transcripts hybridizes with a corresponding one of said probes to form hybridization pairs, wherein
said ordered matrix of probes provides, in aggregate, complements for an ensemble of genes of
said organism sufficient to model the transcriptional responsiveness of said organism to a drug;

(d) specifically detecting a hybridization signal at each hybridization pair to obtain a
matrix-wide signal profile;

(e) comparing said matrix-wide signal profile of drug-stimulated cells with a matrix-wide
signal profile of negative control cells to obtain a specific drug response profile, wherein said drug
response profile provides an estimate of the physiological activity of said candidate drug.

10. A diagnostic kit for estimating the physiological specificity of a candidate drug, said kit
comprising a plurality of cells isolated from a selected organism, wherein each of said cells
contains a recombinant construct comprising a reporter gene operatively linked to an endogenous
transcriptional regulatory element of said target organism such that said transcriptional regulatory
element regulates the expression of said reporter gene, wherein said plurality of cells comprises
an ensemble of the transcriptional regulatory elements of said organism sufficient to model the
transcriptional responsiveness of said organism to a drug.

11. A method for estimating the physiological specificity of a candidate drug affecting the
interaction of two proteins, comprising steps:

(a) contacting a plurality of cells isolated from a selected organism with a candidate drug
under conditions wherein said cells maintain homeostasis, wherein each of said cells contains
two-hybrid recombinant constructs comprising first, second and third recombinant constructs, wherein
the first recombinant construct comprises a reporter gene operatively linked to a transcription
factor binding site, the second recombinant construct encodes a transcription factor activation
domain fused to a first structural gene, and the third recombinant construct encodes a domain
capable of specifically binding said transcription factor binding site fused to a second, different
structural gene, wherein the translation products of said second and third recombinant constructs
are capable of interacting through the translation products of said first and second structural genes
to activate transcription of said reporter gene, and said plurality of cells comprises an ensemble
of the transcriptional regulatory elements of said organism sufficient to model the transcriptional
responsiveness of said organism to a drug;

(b) detecting reporter gene product signals from each of said cells;

(c) comparing said reporter gene product signals from each of said cells before and after
contacting each of said cells with said candidate drug to obtain a drug response profile;

wherein said drug response profile provides an approximation of the physiological activity
of said candidate drug.

12. A method according to claim 11, wherein a decrease in said gene product signal indicates
that the candidate drug inhibits the interaction of the proteins and an increase in said gene product
signal indicates that the candidate drug enhances the interaction of the proteins.



13. A method for making modifications of candidate drugs to obtain modified drugs having
altered physiological activity:

(a) determining the physiological activity of a candidate drug using the method of claim
1; and

(b) modifying the structure of the candidate drug to make a modified drug;

(c) determining the physiological activity of said modified drug using the method of claim
1, wherein the physiological activity of said modified drug is different than the physiological
activity of said candidate drug.